Geospatial Data in R

Week 1 - Review of R and R Markdown

Prof Josh Merfeld

August 9, 2024

Introduction

Syllabus and contents

My slides

  • Before we get into it, I have put all of my material on the web

  • You can find my slides (along with copy-pasteable code) on my GitHub repository:

    • https://github.com/JoshMerfeld/geospatialdataR
    • Scroll down to the bottom and you’ll find all of the links you’ll need.
    • I’d suggest you have this page open during classes. I’ll sometimes ask you to use something on the repo.

Syllabus

  • Some important dates:

What can we do with geospatial data?

Geospatial data is everywhere

  • One estimate says that 100 TB of weather data alone are generated every single day[1]
    • This means there is a lot of data to work with!
    • This is also a challenge, since such large datasets can be difficult to work with

  • Geospatial data is used in a variety of fields
    • Agriculture
    • Urban planning
    • Environmental science
    • Public health
    • Transportation
    • And many more!

Geospatial data in my own work

  • I use geospatial data quite a bit

  • Let’s go through some examples!

Poverty mapping

  • One of the things I work on is poverty mapping
    • This is the process of estimating poverty rates at a very granular level
    • This is important for targeting resources and understanding the distribution of poverty
    • I sometimes use geospatial data to do this

  • Why geospatial data?
    • Because in less developed countries it is often the only data available at a fine spatial level!
    • Survey data alone aren’t sufficient, since survey samples are usually too small to estimate poverty precisely at a granular level

How does pollution affect different outcomes?

  • How is pollution measured?
    • Weather stations
    • Satellites
    • The image to the right comes from satellite data[1]
  • I have research on the effects of pollution on agriculture and on mental health

Construction of roads in India

Some things to note

  • We will be using RStudio throughout the workshops
    • There are other options you are welcome to use (VS Code is the most common alternative)
  • Two general “data cleaning” pipelines:
  • We will be using the tidyverse

Getting started with RStudio

  • Let’s start by looking at the layout of RStudio.

  • For those of you with ample R experience, nothing here will be new!

Why don’t you all give it a try

  • Create a script in RStudio (File > New File > R Script)

  • Save that script in a specific place (folder) on your computer

    • Make sure to keep track of where you save it!
    • I create a folder for each specific project I work on
    • For example, you could create a folder called “Nairobi Workshops” and save the script in it as “day1.R”
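If you prefer code to point-and-click, the same steps can be done from R. This is a minimal sketch: it uses tempdir() only so that it runs on any computer, and the folder and file names are just the example names from the slide; in practice you would pick your own location.

```r
# Build the path to a project folder. tempdir() is a stand-in location
# so the example runs anywhere; replace it with a folder of your choosing.
project_dir <- file.path(tempdir(), "Nairobi Workshops")

# Create the folder (showWarnings = FALSE avoids a warning if it already exists)
dir.create(project_dir, showWarnings = FALSE)

# Create an empty script inside the folder
script_path <- file.path(project_dir, "day1.R")
file.create(script_path)

# Check what is now in the folder
list.files(project_dir)
```

You can then open day1.R in RStudio with File > Open File.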

First things first: the working directory

  • The working directory is the folder that R is currently working in
    • This is where R will look for files
    • This is where R will save (or create) files
  • You can always write out an entire file path, but this is tedious
    • More importantly, it makes your code less reproducible since the path is specific to YOUR computer
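To see the difference, here is a small sketch that saves a file and then reads it back twice: once with a full (absolute) path and once with a path relative to the working directory. The file name example.csv and the use of tempdir() as a stand-in project folder are assumptions made so the example is self-contained.

```r
# Switch to a temporary folder; setwd() invisibly returns the old directory
old_wd <- setwd(tempdir())

# Save a small data frame. With no folder in the file name,
# R writes the file into the working directory.
write.csv(data.frame(x = 1:3), "example.csv", row.names = FALSE)

# Absolute path: works, but is specific to this computer
full_path <- file.path(getwd(), "example.csv")
df_absolute <- read.csv(full_path)

# Relative path: R looks for the file in the working directory
df_relative <- read.csv("example.csv")

# Both approaches read the same data
identical(df_absolute, df_relative)

# Restore the original working directory
setwd(old_wd)
```

Only the relative version would still work unchanged if you sent the folder to a colleague.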

First things first: the working directory

  • One nice thing about RStudio is that if it is not already open, opening a script (e.g. by double-clicking it) starts RStudio with the working directory set to the script’s folder
    • Let’s try this. Save your script to a folder on your computer, close RStudio, then open the script from that folder.
    • Let’s see if it worked!
Code
getwd() # this command will show you your current working directory
[1] "/Users/Josh/Dropbox/KDIS/Classes/geospatialdataR"

First things first: the working directory

  • You can also set the working directory in RStudio
    • Session > Set Working Directory > Choose Directory (or Source File Location)
    • Give it a try and let’s see if it worked!
Code
getwd() # this command will show you your current working directory
[1] "/Users/Josh/Dropbox/KDIS/Classes/geospatialdataR"
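In RStudio, the menu option works by running a setwd() command in the console, so you can also change the working directory in code. A minimal sketch, with tempdir() standing in for your own project folder:

```r
# Remember the starting working directory
old_wd <- getwd()

# Change the working directory (tempdir() stands in for your project folder)
setwd(tempdir())
new_wd <- getwd()
new_wd

# Switch back to where we started
setwd(old_wd)
getwd()
```

Hard-coding a full path inside setwd() has the same reproducibility problem as any other absolute path, which is why opening the script from its folder (or using the menu) is often more convenient.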

Always use the same working directory!

  • Make sure to always set the working directory to the same location when working in the same script!

  • This will avoid problems later

    • It also makes your code more reproducible, as long as your code uses paths relative to the working directory (e.g. if a colleague wants to run it, you just send the entire folder and it works with no changes)

R packages

  • Much of R’s functionality comes from packages
    • Packages are collections of functions that do specific things
    • R comes with a set of “base” packages that are installed automatically
  • We are going to use one package consistently: the “tidyverse”
    • This is actually a collection of packages (e.g. dplyr, ggplot2, readr) designed to work together, with data cleaning in mind
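You can ask R which installed packages are the “base” ones and which packages are attached in your current session. A minimal sketch using only base R functions:

```r
# Names of the "base" packages that ship with R
base_pkgs <- rownames(installed.packages(priority = "base"))
base_pkgs

# Packages currently attached (loaded) in this session
search()
```

Packages you install yourself, such as the tidyverse, will not appear in base_pkgs.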

R packages

The one exception to always using a script? I install packages in the CONSOLE. You can install packages like this:

Code
install.packages("tidyverse") # this will install the tidyverse package. Note the quotes!
  • You only need to install a package once on your computer.
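Because a package only needs to be installed once, a common pattern is to install it only when it is missing. This is a sketch; install_if_missing() is a hypothetical helper written for this example, not a function from any package.

```r
# Hypothetical helper: install a package only if it is not already installed.
# Returns TRUE (invisibly) if the package is available afterwards.
install_if_missing <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)
  }
  invisible(requireNamespace(pkg, quietly = TRUE))
}
```

Running install_if_missing("tidyverse") in the console would then install the tidyverse the first time and do nothing on later runs.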

R packages

The first thing you’ll do in your script is load packages. You do it like this:

Code
# This script is part of the Nairobi Workshop on SAE.
# Date: 26 August 2024 (written earlier!)
# Author: Josh Merfeld

# Load packages (libraries)
library(tidyverse)
  • Note that the first part is a comment I’ve added to the script. R ignores everything after a #, and it has no block-comment syntax, so each comment line starts with its own #.
    • I make a lot of comments!
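One way to confirm that library() worked is to check search(), which lists the attached packages. The sketch below uses splines, a package that comes with R but is not attached in a fresh session, so it runs without installing anything. With the tidyverse you would instead look for entries such as "package:dplyr" and "package:ggplot2".

```r
# Before loading: is splines on the search path?
"package:splines" %in% search()

# Load (attach) the package
library(splines)

# After loading: it now appears on the search path
"package:splines" %in% search()

# Its functions can now be called directly, e.g. bs() builds a B-spline basis
basis <- bs(1:10, df = 4)
dim(basis)
```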

R Markdown